Title: MIT Catalog Holdings with 1925 Publication Year
Creator: Sadie Roosa (sroosa@mit.edu), with special thanks to Beth Brennan, Christine Malinowski, Amy Nurnberger, and the MIT Libraries Rights Working Group (rwg@mit.edu).
Dates: Data exported from MIT Catalog in June 2020, cleaned in December 2020, and released in January 2021.
Language: Primarily English; other languages appear in individual catalog entries

License: CC0 (https://creativecommons.org/share-your-work/public-domain/cc0/)
Recommended citation: MIT Libraries. (2021). MIT Catalog Holdings with 1925 Publication Year [data files and documentation].

Methodology:
Data was exported from the MIT Libraries' Aleph system (the original export file is included here as 1925_original.xlsx). This represents all records in Aleph that have 1925 in the publication date field. 
Data was loaded into Open Refine (https://openrefine.org/). Columns were split on various delimiters to break out MARC subfield values into separate columns. Additional cleaning was carried out, such as removing trailing punctuation and combining multivalued fields using a | delimiter. The exact processes conducted can be seen and reused in the Open Refine script file (open_refine_scripts.txt). Additionally, some individual fields were changed to correct errors in the original catalog record, such as a date appearing in the location field.
Data was exported from Open Refine in TSV format and is presented here as 1925_cleaned.tsv.
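
The cleaning steps described above can be illustrated with a short Python sketch. This is not the process that was actually run (that is recorded in the Open Refine script file); it only mirrors the three operations in plain code. The "$$" subfield marker, the set of trailing punctuation characters, the function names, and the sample field value are all assumptions made for illustration.

```python
# Illustrative sketch only; the real cleaning was done in Open Refine.
# Assumptions: subfields are marked as "$$" + one-letter code, and the
# trailing punctuation to remove is limited to the characters below.
TRAILING_PUNCTUATION = " ,;:/."


def strip_trailing_punctuation(value):
    """Remove assumed cataloging punctuation from the end of a value."""
    return value.rstrip(TRAILING_PUNCTUATION)


def split_subfields(field, marker="$$"):
    """Split a raw field into {subfield code: [cleaned values]}."""
    subfields = {}
    for chunk in field.split(marker):
        if not chunk:
            continue  # text before the first marker is empty
        code, value = chunk[0], chunk[1:].strip()
        subfields.setdefault(code, []).append(strip_trailing_punctuation(value))
    return subfields


def join_multivalued(values, delimiter="|"):
    """Combine repeated values into one cell using the | delimiter."""
    return delimiter.join(values)


# Made-up field value, not taken from the dataset.
example = "$$aNew York :$$bMacmillan,$$c1925."
parsed = split_subfields(example)
```

Note that a simple strip like this would also remove the period from a legitimate abbreviation at the end of a value, which is one reason to consult the Open Refine script for the exact rules that were applied.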

Files:
1925_cleaned.tsv - cleaned version of catalog data, split into columns defined in data dictionary, one catalog record per row
1925_original.xlsx - original file exported from MIT Libraries' Aleph system in June 2020, including all records with 1925 in the publication date field, one catalog record per row
data_dictionary - table of column headers from 1925_cleaned.tsv with definitions of each term, and notes about formatting of values
open_refined_scripts.txt - text file of the JSON exported from Open Refine showing all of the processes conducted in transforming 1925_original.xlsx into 1925_cleaned.tsv
README.txt - this document
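
As a possible starting point for working with 1925_cleaned.tsv, the sketch below reads tab-separated rows and splits each cell on the | delimiter used for multivalued fields. The function name and the sample column headers ("title", "language") are hypothetical; the actual headers are defined in the data dictionary. The sketch also assumes the file follows standard CSV-style quoting and that | is never used as literal text inside a value.

```python
import csv
import io


def read_cleaned_rows(handle, multivalue_delimiter="|"):
    """Read tab-separated rows, returning each cell as a list of values."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for record in reader:
        rows.append({
            column: (value.split(multivalue_delimiter) if value else [])
            for column, value in record.items()
            if column is not None  # ignore cells beyond the header width
        })
    return rows


# Made-up sample standing in for the real file.
sample = io.StringIO("title\tlanguage\nExample title\teng|fre\nAnother title\t\n")
rows = read_cleaned_rows(sample)
```

To read the real file, open it and pass the handle in, for example with open("1925_cleaned.tsv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and may need adjusting.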




